10 - Deep Learning - Loss and Optimization Part 1 [ID:13706]

Welcome everybody to deep learning. Today we want to continue talking about loss functions and optimization, and go into a bit more detail on these interesting problems.

We have a search technique, essentially local search with gradient descent, to try to find a program running on these networks such that it can solve interesting problems such as speech recognition or machine translation.
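As a very rough illustration of that local search idea, here is a minimal gradient descent loop on a toy one-dimensional loss; the quadratic loss and the learning rate are my own illustrative choices, not values from the lecture:

```python
# Minimal sketch of gradient descent as local search over a parameter.
# Toy loss and learning rate are illustrative assumptions.

def loss(w):
    # toy loss: squared distance to an optimum at w = 3 (unknown to the optimizer)
    return (w - 3.0) ** 2

def grad(w):
    # analytic gradient of the toy loss
    return 2.0 * (w - 3.0)

w = 0.0              # start somewhere in parameter space
learning_rate = 0.1
for step in range(50):
    w = w - learning_rate * grad(w)   # move a small step downhill

print(w, loss(w))    # w is now close to the minimizer 3.0
```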

So let's talk first about loss functions. Loss functions are task dependent: for different tasks you use different loss functions. The two most important task types that we are facing are regression and classification.

In classification you want to estimate a discrete variable for every input. This means that in the two-class problem here on the left you essentially want to decide whether a point belongs to the blue or the red dots, so you need to model a decision boundary. In regression the idea is that you want to model a function that explains your data. You have some input variable, let's say x2, and you want to predict x1 from it, so you compute a function that produces the appropriate value of x1 for any given x2. In this example you can see a line fit.
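To make the two task types concrete, here is a minimal NumPy sketch that computes a squared-error loss for a line fit and a cross-entropy loss for a two-class decision; the data points, labels, and probabilities are made up for illustration and are not from the lecture:

```python
import numpy as np

# --- regression: fit a line x1 = a * x2 + b, measured by squared error ---
x2 = np.array([0.0, 1.0, 2.0, 3.0])
x1 = np.array([0.1, 0.9, 2.1, 2.9])
A = np.stack([x2, np.ones_like(x2)], axis=1)
a, b = np.linalg.lstsq(A, x1, rcond=None)[0]      # least-squares line fit
squared_error = np.mean((a * x2 + b - x1) ** 2)   # L2-type regression loss

# --- classification: blue vs. red, measured by (binary) cross-entropy ---
p_red = np.array([0.9, 0.2, 0.7, 0.1])            # predicted probability of "red"
labels = np.array([1, 0, 1, 0])                   # 1 = red, 0 = blue
cross_entropy = -np.mean(labels * np.log(p_red)
                         + (1 - labels) * np.log(1 - p_red))

print(squared_error, cross_entropy)
```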

We talked about activation functions, the last activation, softmax, the cross-entropy loss, and how we combine them. Obviously there is a difference between the last activation function of a network and the loss function. The last activation function is applied to the individual samples x_m of the batch, and it is present at training and at test time. So the last activation function becomes part of the network and remains there; it produces the output, the prediction, which is generally a vector in some vector space. Such a raw vector is very difficult for people to interpret; they would not know what they are looking at.

The loss function, in contrast, combines all M samples and their labels and produces a loss that describes how good the fit is. This loss is generally a scalar value, and it is only needed during training.
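A minimal sketch of that distinction, assuming a softmax last activation and a cross-entropy loss as discussed; the shapes and toy numbers are illustrative assumptions:

```python
import numpy as np

def softmax(z):
    # last activation: applied per sample, turns scores into a probability vector
    z = z - z.max(axis=-1, keepdims=True)     # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def cross_entropy_loss(probs, labels):
    # loss: combines all M samples and labels into a single scalar
    m = np.arange(len(labels))
    return -np.mean(np.log(probs[m, labels]))

scores = np.array([[2.0, 0.5, -1.0],           # network outputs for M = 2 samples
                   [0.1, 1.2,  0.3]])
labels = np.array([0, 1])                      # ground-truth classes

probs = softmax(scores)                        # present at training AND test time
loss = cross_entropy_loss(probs, labels)       # needed at training time only
print(probs, loss)
```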

Interestingly, many of those loss functions can be put into a probabilistic framework, and this leads us to maximum likelihood estimation. As a reminder, in maximum likelihood estimation we consider everything to be probabilistic.

So we have a set of observations, capital X, that consists of the individual observations x_1 to x_M. Then we have the associated labels y_1 to y_M, which also stem from some distribution. And of course we need a conditional probability density function that describes how y and x are related. In particular, we can compute the probability of y given some observation x, which is very useful if you want to decide on a specific class, for example.
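Written out, deciding for a specific class from this conditional probability is usually the maximum a posteriori choice; the notation below is a standard way to write it, not necessarily the lecture's exact slide:

```latex
\hat{y} = \operatorname*{arg\,max}_{y} \; p(y \mid x)
```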

Now we have to somehow model this data set. The samples are drawn from some distribution, and the joint probability of the given data set can then be expressed as a product over the individual conditional probabilities. Of course, this only works if the samples are independent and identically distributed; then you can simply write the joint probability as that product.
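Under the i.i.d. assumption, this factorization and the resulting maximum likelihood estimate are usually written as follows; the parameter vector θ and the negative-log step are standard notation that I am adding here, not necessarily the lecture's exact symbols:

```latex
p(Y \mid X, \boldsymbol{\theta})
  = \prod_{m=1}^{M} p(y_m \mid x_m, \boldsymbol{\theta}),
\qquad
\hat{\boldsymbol{\theta}}
  = \operatorname*{arg\,max}_{\boldsymbol{\theta}} \prod_{m=1}^{M} p(y_m \mid x_m, \boldsymbol{\theta})
  = \operatorname*{arg\,min}_{\boldsymbol{\theta}}
    \left( - \sum_{m=1}^{M} \log p(y_m \mid x_m, \boldsymbol{\theta}) \right)
```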

Part of a video series

Access: Open access
Duration: 00:15:31 min
Recording date: 2020-04-21
Uploaded on: 2020-04-21 02:06:12
Language: en-US

Deep Learning - Loss and Optimization Part 1

This video explains how to derive L2 Loss and Cross-Entropy Loss from statistical assumptions. Highly relevant for the oral exam!

Video References:
Lex Fridman's Channel

Further Reading:
A gentle Introduction to Deep Learning

Tags

Optimization, loss, artificial intelligence, deep learning, machine learning, pattern recognition, gradient descent